ON THE CALCULATION OF MUTUAL INFORMATION


Tyrone  E.  Duncan* 

1. Introduction

Calculating the amount of information about a random function contained
in another random function has important uses in communication theory.
An expression for the mutual information for continuous time random
processes has been given by Gelfand and Yaglom [1], Chiang [2] and
Perez [3] by generalizing Shannon's result [4] in a natural way. Under a
condition of absolute continuity of measures the continuous time
expression has the same form as Shannon's result. For two Gaussian
processes Gelfand and Yaglom express the mutual information in terms of
a mean square estimation error. We generalize this result to diffusion
processes and express the solution in a different form which is more
naturally related to a corresponding filtering problem. We also use these
results to calculate some information rates.

2.  Problem  Statement 

We shall consider two random processes which are the solutions of
the following stochastic differential equations

$$dX_t = a(t, X_t)\,dt + b(t, X_t)\,dB_t \tag{1}$$

$$dY_t = c(t, X_t)\,dt + h(t)\,d\tilde{B}_t \tag{2}$$

where the solution is obtained for the interval [0, 1] and (for notational
simplicity) $X_0 = 0$ and $Y_0 = 0$. The processes $\{X_t\}$ and $\{Y_t\}$ are n and m
dimensional respectively and the processes $\{B_t\}$ and $\{\tilde{B}_t\}$ are n and m
dimensional standard Brownian motions. The elements of the vectors a and
c and the matrices b and h are continuous in t and globally Lipschitz
continuous in $X_t$. The inverse of the matrix h(t), $h^{-1}(t)$, exists and is
continuous for all $t \in [0, 1]$. We wish to calculate the amount of
information in the process $\{Y_t\}$ about the process $\{X_t\}$.
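As an illustration only, sample paths of (1) and (2) can be approximated
with the Euler-Maruyama scheme. The sketch below treats the scalar case
n = m = 1. The function name simulate, the step count, the particular
coefficients, and the use of independent increments for the two Brownian
motions are assumptions made for this example and are not part of the
problem statement.

```python
import math
import random

def simulate(a, b, c, h, n_steps=1000, seed=0):
    """Euler-Maruyama sample path of (1)-(2) on [0, 1], scalar case."""
    rng = random.Random(seed)
    dt = 1.0 / n_steps
    sqrt_dt = math.sqrt(dt)
    x, y = 0.0, 0.0                          # X_0 = 0 and Y_0 = 0, as in the text
    xs, ys = [x], [y]
    for k in range(n_steps):
        t = k * dt
        dB = rng.gauss(0.0, sqrt_dt)         # increment of B_t
        dB_tilde = rng.gauss(0.0, sqrt_dt)   # increment of the second Brownian motion
        # Both updates use the value of X at the start of the step.
        x_new = x + a(t, x) * dt + b(t, x) * dB
        y = y + c(t, x) * dt + h(t) * dB_tilde
        x = x_new
        xs.append(x)
        ys.append(y)
    return xs, ys

# Hypothetical coefficients: a mean-reverting X observed in unit-intensity noise.
xs, ys = simulate(a=lambda t, x: -x, b=lambda t, x: 1.0,
                  c=lambda t, x: x, h=lambda t: 1.0)
```

The scheme is first order in the step size at best, so it is suited to
visualizing the model rather than to evaluating the mutual information.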


3. Preliminaries

In order to calculate the mutual information we must determine
some appropriate Radon-Nikodym derivatives. We shall use the following
result due to Girsanov [5].

Theorem 1: Suppose that

$$dX_t = a(t, X_t)\,dt + b(t, X_t)\,dB_t$$

$$dY_t = \big(a(t, Y_t) + b(t, Y_t)\,h(t, Y_t)\big)\,dt + b(t, Y_t)\,dB_t$$

where

i) $t \in [s, 1]$, $X(s) = Y(s)$

ii) a and h are n vectors and b is an n x n matrix

iii) $a(\cdot,\cdot)$, $b(\cdot,\cdot)$ and $h(\cdot,\cdot)$ are measurable in both variables.
In particular a and b are continuous in their first variable and
globally Lipschitz continuous in their second variable.

iv) $\int_s^1 |h(t, X_t)|^2\,dt < \infty$ a.s.

v) $|h(t, X_t)| < h_0(\sup |X_t|)$
where $h_0$ is a nondecreasing function of a real variable.

Then the measures $\mu_X$ and $\mu_Y$ induced on $C_n[s, 1]$ (the space of all
continuous functions with values in $R^n$) by $\{X_t\}$ and $\{Y_t\}$ respectively, are
mutually absolutely continuous.

The Radon-Nikodym derivative $d\mu_Y/d\mu_X$ is given by

$$\frac{d\mu_Y}{d\mu_X} = \exp\left[\int_s^1 h^T(u, X_u)\,dB_u - \frac{1}{2}\int_s^1 |h(u, X_u)|^2\,du\right] \tag{3}$$

We shall also use a result of Duncan [6] giving the expression for
the likelihood function of a related detection problem.

Theorem 2: Consider the following detection problem

$$dY_t = \begin{cases} c(t, X_t)\,dt + h(t)\,d\tilde{B}_t & \text{for signal present} \\ h(t)\,d\tilde{B}_t & \text{for signal not present} \end{cases} \tag{4}$$

where $X_t$ is the solution of (1) with the assumptions indicated there.
Then the Radon-Nikodym derivative, $\Lambda_t$, for this detection problem
is given by

$$\begin{aligned}
\Lambda_t &= E_{\mu_X}\left[\exp\left(\int_0^t c^T(u, X_u)\,g_u^{-1}\,dY_u - \frac{1}{2}\int_0^t c^T(u, X_u)\,g_u^{-1}\,c(u, X_u)\,du\right)\right] \\
&= \exp\left[\int_0^t \hat{c}^T(u, X_u)\,g_u^{-1}\,dY_u - \frac{1}{2}\int_0^t \hat{c}^T(u, X_u)\,g_u^{-1}\,\hat{c}(u, X_u)\,du\right]
\end{aligned} \tag{5}$$

where $E_{\mu_X}$ corresponds to integration with respect to the measure $\mu_X$
generated by the solution of (1), $\hat{c}(t, X_t) = E[c(t, X_t) \mid Y_u, 0 \le u \le t]$
(the conditional expectation of $c(t, X_t)$ given the augmented Borel field
generated by $\{Y_u, 0 \le u \le t\}$), and $g = hh^T$.

Generalizations of Shannon's mutual information have been discussed
by Gelfand and Yaglom [1], Chiang [2] and Perez [3]. They obtain
the following result as the natural extension of Shannon's mutual
information.

Theorem 3: Let $\xi$ and $\eta$ be two random vectors. The mutual information
$J(\xi, \eta)$ is given by

$$J(\xi, \eta) = \int \alpha(x, y)\,\log \alpha(x, y)\,dP_\xi(x)\,dP_\eta(y)$$

where

$$\alpha(x, y) = \frac{dP_{\xi\eta}(x, y)}{dP_\xi(x)\,dP_\eta(y)} \tag{6}$$
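For random variables taking finitely many values the integral in
Theorem 3 reduces to a finite sum, which gives a quick way to check the
formula. The sketch below is only an illustration of that discrete
special case; the function name mutual_information and the two test
distributions are assumptions of the example.

```python
import math

def mutual_information(joint):
    """Mutual information (in nats) of a discrete joint distribution.

    joint[i][j] is the probability that xi takes its i-th value and
    eta takes its j-th value.
    """
    p_x = [sum(row) for row in joint]          # marginal of xi
    p_y = [sum(col) for col in zip(*joint)]    # marginal of eta
    total = 0.0
    for i, row in enumerate(joint):
        for j, p_xy in enumerate(row):
            if p_xy > 0.0:                     # alpha log alpha tends to 0 as alpha tends to 0
                # Discrete analogue of alpha(x, y) in (6).
                alpha = p_xy / (p_x[i] * p_y[j])
                total += alpha * math.log(alpha) * p_x[i] * p_y[j]
    return total

independent = [[0.25, 0.25], [0.25, 0.25]]     # alpha = 1 everywhere
identical = [[0.5, 0.0], [0.0, 0.5]]           # eta is a copy of a fair bit xi

j_independent = mutual_information(independent)
j_identical = mutual_information(identical)
```

Independent variables give zero, and a fair bit observed without error
gives log 2 nats, as expected from Shannon's definition.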
4. Main Result

We have now established sufficient preliminaries to obtain the
main result of this paper.

Theorem 4: Consider the processes $\{X_t\}$ and $\{Y_t\}$ obtained as solutions
of (1) and (2). The mutual information contained in $\{Y_u, 0 \le u \le 1\}$ about
$\{X_u, 0 \le u \le 1\}$ is given by the following expression

$$J(X, Y) = \frac{1}{2}\,E\int_0^1 [c(u, X_u) - \hat{c}(u, X_u)]^T\,g^{-1}(u)\,[c(u, X_u) - \hat{c}(u, X_u)]\,du \tag{7}$$

where $\hat{c}$ is defined in Theorem 2.

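Proof: To calculate the mutual information we must compute an
appropriate Radon-Nikodym derivative. Let

$$\psi = \frac{d\mu_{XY}}{d\mu_X\,d\mu_Y}$$

where $\mu_{XY}$ is the measure on the product space generated by (1) and (2)
and $\mu_X$ and $\mu_Y$ are the marginals. By a simple calculation, essentially
using only the absolute continuity results of Theorems 1 and 2 we have

$$\frac{d\mu_{XY}}{d\mu_X\,d\mu_Y} = \frac{\varphi_1}{\Lambda_1}$$

where $\Lambda_t = E_{\mu_X}[\varphi_t]$ is given in Theorem 2 for the detection problem,
$\varphi_t$ being the exponential inside the expectation in (5). Thus

$$J(X, Y) = \int \psi \log \psi\,d\mu_X\,d\mu_Y$$

with

$$\log \psi = \int_0^1 [c(t, X_t) - \hat{c}(t, X_t)]^T g_t^{-1}\,dY_t - \frac{1}{2}\int_0^1 [c^T(t, X_t)\,g_t^{-1}\,c(t, X_t) - \hat{c}^T(t, X_t)\,g_t^{-1}\,\hat{c}(t, X_t)]\,dt$$

Substituting $dY_t = c(t, X_t)\,dt + h(t)\,d\tilde{B}_t$ and using the fact that the integral
with respect to $\tilde{B}_t$ is a martingale we have

$$\begin{aligned}
J(X, Y) &= E\left\{\int_0^1 [c^T(t, X_t)\,g_t^{-1}\,c(t, X_t) - \hat{c}^T(t, X_t)\,g_t^{-1}\,c(t, X_t)]\,dt \right. \\
&\qquad \left. - \frac{1}{2}\int_0^1 [c^T(t, X_t)\,g_t^{-1}\,c(t, X_t) - \hat{c}^T(t, X_t)\,g_t^{-1}\,\hat{c}(t, X_t)]\,dt\right\} \\
&= \frac{1}{2}\,E\int_0^1 (c - \hat{c})^T g^{-1} (c - \hat{c})\,dt
\end{aligned}$$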

Remark 1. This result is in a different form from that obtained by
Gelfand and Yaglom for Gaussian processes but by using some resolvent
identities obtained by Siegert [7] we can show the equivalence of the two
results and the relation of Siegert's work to the recursive linear
filtering of Kalman and Bucy [8].

Remark 2. Some obvious simple extensions of the above result are, for
example, letting $c(t, X_t)$ also be a function of $Y_t$; in fact c can be a
functional of the past of both the processes $\{X_t\}$ and $\{Y_t\}$ with a suitable
function space Lipschitz assumption (cf. K. Ito and Nisio [9]).

Remark 3. If we let $c(t, X_t) = X_t$ and consider the process $\{Y_t\}$ as
observations of the process $\{X_t\}$ in noise, then twice the mutual information is
merely the integral of the trace of the optimal mean square filtering error
for estimating $\{X_t\}$ from $\{Y_t\}$. In the one dimensional case the "new
data" ($dB_t$) is weighted according to the additional amount of conditional
mutual information it possesses (cf. Kushner [10] or Duncan [11]).

5. An Application to Information Rate

We shall consider a Gaussian problem in more detail and obtain
some results for information rate. These results extend and simplify
some results of Gelfand and Yaglom [1] and indicate rates of convergence
for some of their approximations. The methods used here require only
time-domain techniques which indicate more clearly the necessary
properties for the existence of the information rates.

First, though, we give the definition that we shall use for
information rate (cf. Gelfand and Yaglom [1] or Pinsker [12]).

Definition: The rate of generation of information about a process $\eta$ by a
process $\xi$ is

$$\bar{I}(\xi, \eta) = \lim_{T \to \infty} \frac{1}{T}\,J(\xi_0^T, \eta_0^T) \tag{8}$$

where $\xi_0^T$ and $\eta_0^T$ denote the processes on the interval [0, T] and $\bar{I}$ is only
defined when the limit exists.

We shall obtain a result for the existence of information rate in
terms of some system theory results. This result will indicate some
useful bounds on approximations that one obtains by using finite time
calculations for information rate.

We shall calculate the rate of generation of information about a
Gaussian process $\{X_t\}$ by another Gaussian process $\{Y_t\}$. Specifically
we have the following equations

$$dX_t = a(t)\,X_t\,dt + b(t)\,dB_t \tag{9}$$

$$dY_t = f(t)\,X_t\,dt + d\tilde{B}_t \tag{10}$$

where $\{B_t\}$ and $\{\tilde{B}_t\}$ are independent n and m dimensional Brownian
motions respectively. We shall assume also that the matrices a, b, and f
have elements which are continuous functions of t. The interval of
solution is the half line $[0, \infty)$. The initial conditions are $X_0 = \alpha$, a zero mean
Gaussian random vector, and $Y_0 = 0$.

We shall also consider the case where the coefficients of the
stochastic differential equations (9) and (10) are not functions of time. In
this case we shall use the same symbols for the coefficients deleting the
variable t, i.e.,

$$dX_t = a\,X_t\,dt + b\,dB_t \tag{11}$$

$$dY_t = f\,X_t\,dt + d\tilde{B}_t \tag{12}$$

where the appropriate assumptions for (9) and (10) are still in effect.
Whenever the process $\{Y_t\}$ is a process of observations from which
we wish to obtain a best mean square estimate of the process $\{X_t\}$, then
we have a well known filtering problem. In fact, by Theorem 4 the mutual
information for (9) and (10) (or (11) and (12)) is obtained from the integral
of the trace of the error covariance matrix for this filtering problem.

What we intend to show is that this mean square error converges to a
steady-state solution which will then give us the appropriate information
rate.
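A scalar instance makes the link between Theorem 4 and the filtering
problem concrete. With n = m = 1 and constant coefficients, (7) with
c = fX and g = 1 reads J = (1/2) times the integral of f^2 P(t) over
[0, T], where the error variance P(t) satisfies the scalar Riccati
equation dP/dt = 2aP - f^2 P^2 + b^2 of Kalman and Bucy [8]. The sketch
below integrates that equation numerically; the function name, the
Runge-Kutta and trapezoidal discretizations, and the parameter values are
assumptions made for the illustration.

```python
import math

def mutual_information_scalar(a, b, f, p0, T, n_steps=20000):
    """J = (1/2) f^2 * integral_0^T P(t) dt for scalar (11)-(12), P(0) = p0."""
    def riccati(p):
        # Scalar error-variance equation: dP/dt = 2 a P - f^2 P^2 + b^2.
        return 2.0 * a * p - (f * p) ** 2 + b * b

    dt = T / n_steps
    p = p0
    integral = 0.0
    for _ in range(n_steps):
        # Classical fourth-order Runge-Kutta step for P.
        k1 = riccati(p)
        k2 = riccati(p + 0.5 * dt * k1)
        k3 = riccati(p + 0.5 * dt * k2)
        k4 = riccati(p + dt * k3)
        p_next = p + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        integral += 0.5 * dt * (p + p_next)    # trapezoidal rule for the integral of P
        p = p_next
    return 0.5 * f * f * integral

# Constant Gaussian signal (a = b = 0) with variance 2 observed for 3 time units.
j_numeric = mutual_information_scalar(a=0.0, b=0.0, f=1.0, p0=2.0, T=3.0)
j_closed_form = 0.5 * math.log(1.0 + 2.0 * 3.0)
```

For a = b = 0 and f = 1 the Riccati equation has the solution
P(t) = p0/(1 + p0 t), so the numerical value can be compared with
(1/2) log(1 + p0 T), which is the comparison made in the last two lines.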

In the subsequent discussion we shall use the following definitions
[8, 13].

Definition: The system (9) and (10) is uniformly completely observable
if there exist fixed positive constants $\sigma$, $\alpha$ and $\beta$ such that

$$0 < \alpha I \le M(t - \sigma, t) \le \beta I$$

for all t where

$$M(t_1, t_2) = \int_{t_1}^{t_2} \Phi^T(t, t_2)\,f^T(t)\,f(t)\,\Phi(t, t_2)\,dt \tag{13}$$

For symmetric matrices A < B (A ≤ B) means that B - A is positive
definite (non-negative definite). The matrix $\Phi$ is the transition matrix for the
ordinary differential equation

$$\frac{dX}{dt} = a(t)\,X$$

Definition: The system (9) and (10) is uniformly completely controllable
if there exist fixed positive constants $\sigma$, $\alpha$, $\beta$ such that

$$0 < \alpha I \le W(t - \sigma, t) \le \beta I$$

where

$$W(t_1, t_2) = \int_{t_1}^{t_2} \Phi(t_2, t)\,b(t)\,b^T(t)\,\Phi^T(t_2, t)\,dt \tag{14}$$
ti 

Definition: The system (11) and (12) is completely observable if the matrix
$M(t_1, t_2)$ (cf. eq. (13)) is positive definite.

Definition: The system (11) and (12) is completely controllable if the matrix
$W(t_1, t_2)$ (cf. eq. (14)) is positive definite.
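To see what the observability condition excludes, one can evaluate (13)
for a small constant-coefficient system. The sketch below uses a
hypothetical two-dimensional example, a = [[0, 1], [0, 0]], for which
the transition matrix is Phi(t, t2) = [[1, t - t2], [0, 1]], together
with a single observation row f. The example system, the function names
and the trapezoidal quadrature are assumptions of the illustration.

```python
def observability_gramian(f, t1, t2, n=2000):
    """M(t1, t2) of eq. (13) for a = [[0, 1], [0, 0]] and a 1 x 2 row f.

    Here Phi(t, t2) = [[1, t - t2], [0, 1]], so the row vector
    f Phi(t, t2) equals (f[0], f[0] * (t - t2) + f[1]).
    """
    dt = (t2 - t1) / n
    m = [[0.0, 0.0], [0.0, 0.0]]
    for k in range(n + 1):
        t = t1 + k * dt
        w = dt * (0.5 if k in (0, n) else 1.0)   # trapezoidal weights
        row = (f[0], f[0] * (t - t2) + f[1])
        for i in range(2):
            for j in range(2):
                m[i][j] += w * row[i] * row[j]   # (f Phi)^T (f Phi)
    return m

def is_positive_definite(m):
    """Sylvester's criterion for a symmetric 2 x 2 matrix."""
    return m[0][0] > 0.0 and m[0][0] * m[1][1] - m[0][1] * m[1][0] > 0.0

m_first = observability_gramian([1.0, 0.0], 0.0, 1.0)    # observe the first component
m_second = observability_gramian([0.0, 1.0], 0.0, 1.0)   # observe the second component
```

Observing the first component gives a positive definite M(0, 1), while
observing only the second gives a singular matrix, since the first
component never enters the observations in that case.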

Assuming that the system (9) and (10) is uniformly completely
controllable and uniformly completely observable, Kalman and Bucy [8] and
Kalman [13] have shown that with an arbitrary initial covariance for $X_0$
the conditional error covariance for the filtering problem (9) and (10) is
bounded and converges uniformly and exponentially to a unique solution.

For the system (11) and (12), assuming complete controllability and
complete observability, the conditional error covariance converges
uniformly to a constant matrix which is the unique positive definite
equilibrium state of

$$\frac{dP}{dt} = aP + Pa^T - Pf^TfP + bb^T \tag{15}$$

which is the optimal matrix mean square error for the Wiener-Kolmogorov
solution to the filtering problem (11) and (12) given the infinite past
$\{Y_u, -\infty < u \le t\}$. With these results it is easy to obtain the following:

Proposition: Given that the system (9) and (10) is uniformly completely
controllable and uniformly completely observable then the rate of
generation of information about the process $\{X_t\}$ by the process $\{Y_t\}$ exists and
is one half the trace of the steady-state error covariance for the filtering
problem for (9) and (10).

Corollary: Given that the system (11) and (12) is completely controllable
and completely observable then the rate of generation of information about
the process $\{X_t\}$ by the process $\{Y_t\}$ exists and is one half the trace of the
optimal matrix mean square error for the Wiener-Kolmogorov solution to
the filtering problem (11) and (12).
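In the scalar case the equilibrium of (15) is available in closed form,
which gives a simple check of the Corollary. The sketch below is an
illustration under stated assumptions: n = m = 1, constant a, b, f with
f nonzero, P(0) = 0, and forward Euler integration. The rate is computed
as (1/2) f^2 P, the integrand of (7) with c = fX and g = 1 at
equilibrium, which equals one half of the error variance when f = 1.
The function names and parameter values are hypothetical.

```python
import math

def steady_state_variance(a, b, f):
    """Positive root of 0 = 2 a P - f^2 P^2 + b^2, the scalar form of (15)."""
    return (a + math.sqrt(a * a + f * f * b * b)) / (f * f)

def information_rate(a, b, f):
    """(1/2) f^2 P at equilibrium; equals half the error variance when f = 1."""
    return 0.5 * f * f * steady_state_variance(a, b, f)

def finite_time_average(a, b, f, T, dt=1e-3):
    """J(T)/T with P(0) = 0, using forward Euler on the scalar Riccati equation."""
    p, integral = 0.0, 0.0
    for _ in range(int(round(T / dt))):
        integral += 0.5 * f * f * p * dt                   # accumulate (1/2) f^2 P dt
        p += (2.0 * a * p - (f * p) ** 2 + b * b) * dt     # Euler step for P
    return integral / T

rate = information_rate(a=-1.0, b=1.0, f=1.0)
approx = finite_time_average(a=-1.0, b=1.0, f=1.0, T=50.0)
```

Because P(t) approaches its equilibrium exponentially, the gap between
the finite-time average and the rate shrinks in proportion to 1/T, which
is the kind of bound on finite time calculations mentioned above.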


Remark 1. We can also calculate mutual information and information rate
when the noise processes $\{B_t\}$ and $\{\tilde{B}_t\}$ are correlated and with
appropriate absolute continuity conditions we can make calculations for "smooth"
noise processes.

Remark 2. From the convergence properties for the conditional error
covariance, rates of convergence can be given for some problems that
Gelfand and Yaglom [1] consider of information and information rate about
a stationary process over a finite interval, contained in a sum of this
process and white noise when the interval is allowed to become unbounded.